Abstract


Photosynthetic eukaryotic cells arose more than a billion years ago through the engulf- 
ment of a cyanobacterium that was then converted into a chloroplast, enabling plants 
to perform photosynthesis. Since this event, chloroplast DNA has been massively trans- 
ferred to the nucleus, sometimes leading to the creation of novel genes, exons, and regula- 
tory elements. In addition to these evolutionary novelties, most cyanobacterial genes have 
been relocated into the nucleus, greatly reducing the size, gene content, and autonomy of 
the chloroplast genome. In this chapter, we will first present our current knowledge on 
the origin and evolution of the plant plastome in the different Archaeplastida lineages 
(Glaucophyta, Rhodophyta, and Viridiplantae), focusing on its gene content, genome 
size, and structural evolution. Second, we will present the factors influencing the rate of 
DNA transfer from the chloroplast to the nucleus, the evolutionary fates of the nuclear 
integrants of plastid DNA (nupts) in their new eukaryotic environment, and the drivers of 
chloroplast gene functional relocation to the nucleus. Finally, we will discuss how cytonu- 
clear interactions led to the intertwined coevolution of nuclear and chloroplast genomes 
and the impact of hybridization and allopolyploidy on cytonuclear interactions. 


Keywords: endosymbiosis, plastome evolution, functional gene transfer, nuclear 
integrant of plastid DNA (nupt), nucleo-cytoplasmic interactions 


1. Introduction 


Photosynthetic eukaryotic organisms harbor a chloroplast genome (also called “plastome”) within
their cells. This genome derives from the endosymbiosis of a prokaryotic organism, which was
then gradually converted into the chloroplast. With the increased number of sequences within
publicly available databases and the emergence of very sophisticated phylogenetic and
phylogenomic analyses, we can infer much more precisely the origin of this primary endosymbiotic
event. In addition, these comparative analyses allow for investigation of plastome evolutionary 
dynamics in the different plant lineages and the extent of nuclear influence over the chloro- 
plast genome. Overall, plant plastomes harbor a very low gene content compared to their pro- 
karyotic ancestor, which appears to result from either gene loss due to redundant functions in 
both chloroplast and nuclear genomes or functional transfer and relocation of chloroplast genes 
into the nucleus. The relocation of thousands of chloroplast genes to the nucleus was made
possible by the massive transfer of DNA from the chloroplast to the nucleus. However,
chloroplast genes that have been integrated into the nucleus are not immediately functional
and have to adapt to their new eukaryotic environment by acquiring various
regulatory elements (i.e., promoter, polyadenylation signal, and target peptide). Although most
of these functional transfers occurred soon after the endosymbiotic event, some clever real-time
experiments (using a selectable marker) have shown how easily and by
which molecular mechanisms DNA is transferred from the chloroplast to the nucleus. Such
experiments have also permitted the study of the subsequent evolution of chloroplast DNA in
the nuclear genome, and of how a chloroplast gene becomes functional in the nucleus.


2. Chloroplast origin and evolution 


Photosynthetic eukaryotic cells arose through the engulfment of a cyanobacterium that was then 
converted into the chloroplast, enabling plants to use sunlight to fix carbon. This major functional
innovation allowed eukaryotes to transition from heterotrophy to autotrophy. This primary
endosymbiotic event is at the origin of the astonishing biodiversity visible today in plants,
including the Glaucophyta, Rhodophyta, and Viridiplantae lineages (Figure 1). With the advent 
of next-generation sequencing technologies, the number of fully sequenced plastomes has hugely 
expanded, providing insight into chloroplast evolution in the different plant lineages. In this part,
we will present our current knowledge of chloroplast origin and of chloroplast genome
evolution, with regard to genome size, gene content, structure, and mutation rate.


2.1. Primary endosymbiosis event and origin of chloroplasts 


The first hypothesis of the endosymbiotic origin of chloroplasts is commonly credited to 
Russian botanist K. Mereschkowsky, who observed similarities between cyanobacteria 
and chloroplasts of plants and algae [1]. This hypothesis was then reaffirmed by Margulis 
in the 1970s. The origin of this primary endosymbiosis event is still debated. While fossil- 
based phylogeny estimated the origin of chloroplasts to be around 1.4-1.7 billion years ago 
[2], gene-based approaches dated it around 0.9 billion years ago [3]. Different phylogenetic 
analyses aimed at determining the cyanobacterial lineage from which the chloroplast was 
derived and revealed that chloroplasts were closely related to the nitrogen-fixing cyanobacte- 
ria Chroococcales, Nostoc sp., and Anabaena variabilis [4, 5]. 


It is now widely accepted that this primary endosymbiotic event has a single origin [6-8]; however,
it is still unclear how long it took for the conversion of the bacterial endosymbiont into a
fully integrated organelle. This transition from endosymbiont to organelle surely involved many
steps. The first steps corresponded to the loss of the bacterial wall and the early acquisition by the
endosymbiont of a transport system to transfer proteins and metabolites from the cytosol to the
chloroplast. This transport system consists of two protein complexes: the translocon of the outer
membrane of the chloroplast (TOC) and the translocon of the inner membrane of the chloroplast (TIC)
[9-11]. The TIC/TOC complexes transport pre-proteins (proteins with a cleavable
chloroplast target peptide) from the cytosol, where they are synthesized, to the chloroplast,
where the target peptide is cleaved (reviewed in [11]). The presence of the same protein import
apparatus in the different Archaeplastida lineages is the best evidence of the single origin of
chloroplasts. Finally, the transition also necessitated the gradual functional transfer of endosymbiont
genes to the nucleus [12], leading to the massive reduction of plastome size and gene content.


Figure 1. Phylogenetic relationships of the different plant lineages formed after the primary endosymbiosis of a
cyanobacterium by an ancestor of the Archaeplastida. The number of available genomes on GenBank is indicated
under the image. For simplicity, “Mosses, Marchantyophytes and Bryophytes” on one side, as well as “Ferns and
Lycopodiophyta” on the other side, were grouped together in the tree. Pictures copyright to L. Brient, M.T. Misset,
R. Delourme, and J. Keller.


2.2. Evolution of chloroplast genomes 


2.2.1. An unequal sequencing effort 


Most of our current knowledge of the conversion from endosymbiont to organelle has been 
obtained by comparing contemporary Archaeplastida organelles with their closest bacterial 
relatives. During the last few years, advances in high-throughput sequencing and bioin- 
formatic methods greatly facilitated the assembly, analysis, and publication of complete 
plastomes. To date, more than 2300 plastomes are fully assembled and deposited in the 
GenBank database. This number of plastomes actually doubled in the last 2 years. However, 
the number of sequenced plastomes varies greatly between the different Archaeplastida 
lineages. Indeed, almost 80% of them belong to Angiosperms. Thus, there is an important 
inequality in the sequencing effort. The poor level of plastome sequencing in plant lineages 
outside of the Angiosperms needs to be improved to fully understand chloroplast genome 
evolution in plants. Some efforts to fill this gap have been performed in the last 2-5 years, 
but they are still insufficient. In the Glaucophyta, only one chloroplast genome is available
(NC_001675), and another is sequenced but not yet published (Lang et al., unpublished). In
contrast, the sequencing of Rhodophyta and Chlorophyta (green algae sensu stricto) species
has greatly improved since 2012: from fewer than 30 plastomes available in 2012 to around 100 in
2017 (Figure 2A and B).


Figure 2. Cumulative numbers of full chloroplast genomes deposited in GenBank for (A) Rhodophyta, (B) Chlorophyta,
and (C) Streptophyta.


2.2.2. Gene content evolution 


As mentioned previously, the conversion of the cyanobacterial endosymbiont into a chloro- 
plast necessitated the functional transfer or replacement of most cyanobacterial genes into the 
nucleus. Compared to the thousands of genes (at least 2000) thought to have been once pres- 
ent in the cyanobacterial genome, Archaeplastida plastomes encode a maximum of around 250 
genes [13, 14]. This observation indicates that most genes (including protein-coding genes and
structural RNAs) present in the cyanobacterial ancestor were functionally transferred relatively
soon after the endosymbiotic event. Although gene content is relatively well conserved among
modern chloroplast genomes, there are important variations. Rhodophyta have the
highest number of genes (237 on average; minimum 207; up to 266 in Grateloupia taiwanensis)
compared to the Glaucophyta (195), Chlorophyta (118 on average; minimum 68; maximum 210),
or Streptophyta (129 on average; minimum 64; maximum 313), when excluding parasitic and
non-chlorophyll species (Table 1).


Average number of structural RNAs 43 35 35 40 45 42 39 45 30


Table 1. Plastome numbers and characteristics (average size, number of proteins, and structural RNAs) among the
Archaeplastida. The minimum and maximum genome sizes are indicated in italic.


These variations in gene content revealed the divergent evolution of plastomes in the different
lineages. As an example, Rhodophyta gene content is characterized by the complete absence
of the NADPH dehydrogenase complex [15]. Conversely, some genes are Rhodophyta-specific
or rare in other Archaeplastida, such as RNase P RNA, tmRNA, or signal recognition
particle RNA [16-18]. Rhodophyta chloroplasts generally have a large genome size (see later)
characterized by a high number of genes and other features such as the presence of bacteria-like
operons, suggesting that Rhodophyta plastomes are phylogenetically closer to the
ancestral cyanobacterial genome than those of any other algae [15]. Gene content variations are
also well documented in Angiosperms, in which several genes, such as infA, ycf1, rps16,
and accD, have been independently and repeatedly
lost in several lineages [19]. Within non-parasitic Angiosperms, a few families, such as
the Fabaceae and the Campanulaceae, have recently lost various chloroplast genes [19].
In these lineages, recent gene losses from the plastome coincide with the transfer of those 
genes to the nucleus, providing insight into the underlying molecular mechanisms impli- 
cated in such events. For example, chloroplast gene loss may occur through a relaxed 
selective constraint on the chloroplast copy when a nuclear copy is already functional. 
This relaxation of selective constraint allows for nonsense mutations that may render the
chloroplast copy non-functional [19, 20]. In addition, genes can become non-functional 
following the loss of their splicing capacity, as observed for rps16 [21, 22]. The plastome 
gene content reduction is even more pronounced in non-chlorophyll organisms, such as 
parasites and obligate symbionts. Among the Angiosperms, 41 plastomes from parasitic 
plants have been sequenced and showed a great reduction in gene content (with only 63 
genes) and size (around 70 kb on average), in line with the progressive loss of photosynthetic
abilities. Similarly, plastome reduction is also observed among algae, such as in the
parasitic Helicosporidium sp. (green algae) or Choreocolax polysiphoniae (red algae). Conversely,
an increase in gene number may also be observed, but to a lesser extent. In Pelargonium,
which has among the highest numbers of chloroplast genes in Angiosperms (more than
180 in P. transvaalense and P. hortorum), there have been multiple duplication events involving 39
genes [23]. Although the number of coding sequences increased in the species belonging to
this genus, this increase was due entirely to duplications and not to neofunctionalization
processes.


2.2.3. Size variation 


Among plants, chloroplast genomes range from less than 100 kb to more than 1 Mb, again
excluding the non-chlorophyll species that exhibit significantly smaller chloroplast genomes
(Table 1). The largest chloroplast genome ever sequenced has very recently been found in the
red alga Corynoplastis japonica. Its genome reaches about 1 Mb and contains 209 genes [24].
On average, the largest plastomes are found in the Rhodophyta with an average size of about
183 kb (minimum = 149,987 bp, maximum = 610,063 bp, excluding the small 90 kb genome of
the parasite C. polysiphoniae), whereas Glaucophyta and Streptophyta have average chloroplast
genome sizes of between 130 and 160 kb (minimum = 107,236 bp; maximum = 242,575 bp,
excluding the parasitic and non-chlorophyll species) (Table 1).


Several factors can explain the important size variations found among the Archaeplastida. In 
the case of the red algae C. japonica and Bulboplastis apyrenoidosa (more than 1 Mb and 600 kb 
long plastomes, respectively), the increase of plastome size is due to an expansion of the intron 
number with more than 200 introns found in these species [24]. In Angiosperms, plastome size
variations have been observed, but to a lesser extent. For example, in Pelargonium, which includes
the species with the largest chloroplast genomes found in Angiosperms (almost 243 kb),
increased size is correlated with the expansion of the inverted repeats (IRs), which can be as long as
75 kb [23, 25]. This has also been observed in the Campanulaceae Lobelia thuliniana [26] and in
Musa acuminata [27]. Expansion of plastomes has also been linked to the presence of an increased
number of repeats, such as in Trifolium [28] or the mimosoids Acacia and Inga [29]. This increase
of plastome size by repeats is presumably the result of a less efficient chloroplast DNA repair 
mechanism [30, 31]. In contrast, plastome size reductions are also relatively common and can 
be due to loss of both coding and non-coding regions, especially in the non-chlorophyll spe- 
cies [32] that have an average plastome size of 71,736 bp in Angiosperms (Figure 2). 


2.2.4. Structural evolution 


Among plants, most plastomes seem to exhibit a conserved quadripartite structure, with a 
large and small single copy separated by two inverted repeats (Palmer 1983). However, mul- 
tiple rearrangements occurred in diverse lineages, which modified this conserved structure. 
One of the most striking examples is the loss of one IR that occurred multiple times in the dif- 
ferent chloroplast-bearing lineages, such as in the Fabaceae and the Geraniaceae [30, 33, 34]. 
This has also been reported for different Gymnosperm species such as Pseudotsuga menziesii,
Pinus radiata, and Cephalotaxus oliveri, as well as in multiple lineages of Chlorophyta [35-37].


Chloroplast genome structure and gene order are also highly affected by inversions. Many 
inversions have been described in the literature, especially in legumes, with, for instance, inverted
fragments of 50 kb in the Papilionoideae [38], 36 kb in the Genistoids [39], 29 kb in Sophoreae [40],
or 7 kb in Tylosema esculentum [41]. Multiple inversions have also been found in Geraniaceae,
Campanulaceae (more than 40 inversions detected), and other lineages [25, 42, 43]. Inversions 
can be caused through flip-flop recombinations between repeat sequences [39, 44]. 


2.2.5. Evolution rates of plastomes 


Chloroplast genomes are known to be highly conserved, with relatively low rates of muta- 
tions, especially when compared to the plant nuclear genome. Indeed, the chloroplast genome 
evolves on average 10 times slower than the nuclear genome [45], with about 1 mutation or fewer
per kb per million years [46] compared with approximately 7 mutations per kb per million years for the
nuclear genome [47]. However, there are some exceptions, especially in three Angiosperm 
families (i.e., Fabaceae, Campanulaceae, and Geraniaceae) that are known to have accelerated 
evolutionary rates of their plastomes along with multiple structural rearrangements and size 
variations [19, 28, 30, 42, 44, 48, 49]. For example, the ycf4 gene appears to be a hotspot of variation
in Lathyrus, and this gene evolves 20 times faster than the rest of the chloroplast genome
[19]. This localized hypermutable chloroplast region evolves even faster than the nuclear
genome. Similarly, faster evolution has been observed in the clpP gene in mimosoids [29]. In
Lupinus (Fabaceae), two hypervariable regions have been identified (the ycf1 gene and the psaA-ycf4
region) and are characterized by high numbers of indels (usually longer than
20 bp) and mutations [22].
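

As a rough illustration of the rates quoted above, the short Python sketch below compares them numerically. It assumes the point values 1 and 7 mutations/kb/million years (the text gives "about 1 or less" and "approximately 7") and applies the 20-fold factor reported for ycf4 to the average chloroplast rate. The function and variable names are purely illustrative and are not taken from the cited studies.

```python
# Illustrative arithmetic only; rates are point values taken from the text.
CHLOROPLAST_RATE = 1.0  # mutations/kb/million years (upper bound, "about 1 or less")
NUCLEAR_RATE = 7.0      # mutations/kb/million years ("approximately 7")
YCF4_FOLD = 20.0        # ycf4 in Lathyrus relative to the rest of the plastome


def expected_mutations(rate, length_kb, time_my):
    """Expected number of mutations for a region, assuming a constant rate."""
    return rate * length_kb * time_my


# Assumption: the 20-fold factor is applied to the average chloroplast rate.
ycf4_rate = CHLOROPLAST_RATE * YCF4_FOLD
ycf4_vs_nuclear = ycf4_rate / NUCLEAR_RATE

print(f"ycf4: ~{ycf4_rate:.0f} mutations/kb/My, {ycf4_vs_nuclear:.1f}x the nuclear rate")
print(f"1 kb over 5 My: chloroplast {expected_mutations(CHLOROPLAST_RATE, 1, 5):.0f}, "
      f"nuclear {expected_mutations(NUCLEAR_RATE, 1, 5):.0f}")
```

Under these assumptions, the ycf4 hotspot would evolve roughly three times faster than the nuclear average, in line with the statement that this region evolves faster than the nuclear genome.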


To sum up this first section on the origin and evolution of plant plastomes originating from the 
primary endosymbiosis event, the recent sequencing and bioinformatics progress significantly 
increased the number of chloroplast genomes available for the scientific community. These 
advances have greatly improved our knowledge about the evolutionary dynamics of plastomes. 
Despite the diversity of organisms that harbor chloroplasts, plastomes in general seem to be 
relatively well conserved among the Archaeplastida (in terms of structure, size, and gene content);
however, multiple independent alterations of these features have been observed in the different
lineages. In addition, a few plant families (or groups of species) seem to present an atypical
evolution of the chloroplast genome. The continuing effort to sequence many
more plastomes (especially in the Glaucophyta and Rhodophyta) should allow the identification of
new examples of such atypical evolution and permit a better understanding of the
causes and molecular mechanisms that limit or accelerate plastome evolution.


3. Impact of the cyanobacterial endosymbiosis on plant nuclear 
genome evolution and origin of chloroplast proteins 


Since the endosymbiotic event, the host genome (nuclear) has acquired most of the cyanobac- 
terial genes, leading to the gradual loss of autonomy of the endosymbiont and the reduction 
of its genome. In this part, we will present our current knowledge on the mechanisms as well
as the numerous cases of chloroplast DNA transfer to the nucleus and where this DNA is now
integrated in the nuclear genome. We will then detail the subsequent evolution and adaptation
of the transferred chloroplast DNA in its new eukaryotic environment. We
will also discuss which factors can influence relocation of a chloroplast gene to the nucleus, 
and how a chloroplast gene transferred to the nucleus may become functional. Finally, we will 
discuss the important role that transfer of chloroplast DNA to the nucleus plays in the process 
of diversifying the plant nuclear gene content. 


3.1. DNA transfer from the chloroplast to the nucleus 


Much earlier than the complete sequencing and assembly of the first chloroplast genome 
(Nicotiana tabacum: [50]), Kawashima et al. [51] observed that the gene encoding the small
subunit of the Rubisco chloroplast protein could be transferred by pollen and thus must be 
encoded in the nucleus. From this early observation arose the question of whether nuclear 
genes encoding chloroplast proteins were of eukaryotic origin or resulted from transfer of 
DNA from the chloroplast to the nucleus. The existence of DNA transfer from the chloroplast
to the nucleus was discovered a decade later using Southern blotting, by observing the
presence of sequences with high homology between spinach (Chenopodiaceae) chloroplast
and nuclear genomes [52, 53], as well as in other closely related Chenopodiaceae species
[54]. With the advent of the polymerase chain reaction, Ayliffe & Timmis [54] amplified and
sequenced a chloroplast DNA sequence from N. tabacum nuclear DNA. This nuclear integrant
of plastid DNA (also called ‘nupt’) presented more than 99% sequence identity with its homologous
chloroplast sequence, indicating that this chloroplast DNA fragment had been transferred
to the Nicotiana nucleus during the last million years. Using similar techniques, these authors
also observed that the tobacco nuclear genome contained long tracts of chloroplast DNA at
different locations. These different nupts may be as large as the whole chloroplast genome
(about 150 kb) and did not all show the same level of sequence identity with their
homologous chloroplast sequence, indicating that chloroplast DNA had been transferred
multiple times to the nucleus during plant evolution [54]. To decipher how frequently chloroplast
DNA is transferred to the nucleus, experiments using an antibiotic resistance gene tailored
for nuclear expression (i.e., nuclear promoter and terminator) were performed [55, 56].
After introducing this selectable marker (antibiotic resistance gene) into N. tabacum chloro- 
plast genome and obtaining homoplastomic lines, it was demonstrated that DNA transfer 
occurred once in about 16,000 pollen grains [55] or once for every 5 million somatic cells [56], 
highlighting the high rate of DNA transfer from the chloroplast to the nucleus. This deluge of 
DNA transfer may be even higher in the presence of environmental stresses, such as mild heat 
[57] or cold stress [58]. It is important to note that in these experiments, the reported transfer 
rate may be underestimated as only the transfer of the selectable marker (about 2 kb) from the 
chloroplast genome could be identified. The higher rate of transfer observed in reproductive 
tissue (from pollen grains) compared to somatic cells may be explained by the higher degree 
of degradation of chloroplast DNA during pollen development (since chloroplast genomes 
are maternally inherited) than in somatic cells (more stable plastids). This hypothesis was sup- 
ported by the observation of a much lower frequency of DNA transfer from female germlines 
(about 1 in every 270,000 ovules) [59]. Characterization of some of these newly transferred
chloroplast sequences demonstrated that integration occurred by non-homologous end joining
[60] and predominantly in open chromatin [61]. Surprisingly, it has also been demonstrated
that DNA fragments from various plastome regions may insert simultaneously at the
same nuclear location [60].
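

To put the transfer frequencies reported above on a common scale, the following Python sketch computes fold differences between tissues. It uses only the three frequencies given in the text; the dictionary keys and the function name are illustrative choices, and the comparison ignores any differences in experimental design between the cited studies.

```python
# Transfer frequencies reported in the text (events per cell or gamete screened).
TRANSFER_FREQUENCY = {
    "pollen": 1 / 16_000,       # [55]
    "ovule": 1 / 270_000,       # [59]
    "somatic": 1 / 5_000_000,   # [56]
}


def fold_difference(tissue_a, tissue_b):
    """How many times more frequent transfer is in tissue_a than in tissue_b."""
    return TRANSFER_FREQUENCY[tissue_a] / TRANSFER_FREQUENCY[tissue_b]


for other in ("ovule", "somatic"):
    print(f"pollen vs {other}: {fold_difference('pollen', other):.1f}-fold")
```

With these figures, transfer detected through pollen is about 17 times more frequent than through ovules and about 300 times more frequent than in somatic cells.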


3.2. Short-term and long-term evolution of chloroplast DNA transferred to the 
nucleus 


Some of the chloroplast DNA fragments that were experimentally shown to insert in the 
nuclear genome were characterized [55, 60] and were often large in size (usually greater 
than 10 kb in length). Considering the massive transfer of chloroplast DNA to the nucleus, 
one would expect that some of these nupts would be deleted to avoid a rapid increase of the 
nuclear genome size. This hypothesis was tested by studying the fate of these newly inte- 
grated chloroplast fragments [62]. Half of the lines presented an unstable inheritance of the 
nupts, after only one to two generations. Most lines presented a varying level of instability 
between the different areas of the same plant, and the loss of the nupt most often occurred 
during somatic cell division. However, it was also observed that some nupt loss occurred 
during meiosis [62]. Thus, even if constantly and massively integrated into the nucleus, at 
least some of these novel nupts may be rapidly removed and likely an even larger number 
may be deleted over longer evolutionary time scales. Many nupts have been identified in
various sequenced plant nuclear genomes [63-69], by fluorescent in situ hybridization using
a chloroplast DNA probe [70] or using PCR-derived methods [71, 72]. Using the nuclear
genome sequences of 17 plant species, the number, size, and genomic organization of nupts
were studied [65]. This study found a positive correlation between nuclear genome size, organelle
numbers in cells, and cumulative lengths of nupts, as previously observed from a smaller
number of plant nuclear genomes [64, 67, 73, 74]. To date, the largest identified nupt was 
found in the rice nuclear genome (131 kb) and corresponds to almost the entire chloroplast 
genome size (97.4%). A detailed analysis of the nupts present in the rice genome revealed
that nupts were mainly integrated within the pericentromeric regions [68]. Thereafter, they 
were rapidly fragmented, vigorously shuffled, and 80% of them were eliminated in the mil- 
lion years following their integration. Accordingly, the largest nupts were found to be the 
youngest, whereas the smallest nupts were found to be older. Most of the nupts identified in 
rice were less than 1 million years old (myo), whereas only a few were older than 5 myo. The 
recently integrated nupts were assumed to be decaying over evolutionary time into smaller 
fragments [64]. In rice, the half-lives of nupts were estimated to be 0.5 million years for fragments
longer than 1.6 kb and 2.2 million years for fragments shorter than 1.6 kb
[68]. This result differs from those obtained experimentally in N. tabacum, where several old 
nupts (up to 6 myo) were larger than 2 kb [71]. The evolutionary fate of nupt sequences was 
scrutinized and revealed the prevalence of G:C > A:T transitions, which partly resulted from 
the deamination of methylcytosine [71]. However, over-representation of these transition 
types was similar to what was observed in the Arabidopsis nuclear genome, indicating that 
nupts evolved in a nuclear-specific manner. Similarly, potential protein-coding
sequences and non-coding sequences present within nupts had similar fates and both evolved
neutrally, in accordance with the non-functionality of almost all nupts.
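

The half-lives reported above for rice nupts can be turned into expected losses over time. The Python sketch below is a minimal illustration that assumes simple exponential (first-order) decay, which is an assumption made here for illustration and not necessarily the model used in [68]; the constant and function names are hypothetical.

```python
# Half-lives reported for rice nupts [68], in million years.
HALF_LIFE_LARGE = 0.5   # fragments longer than 1.6 kb
HALF_LIFE_SMALL = 2.2   # fragments shorter than 1.6 kb


def fraction_remaining(time_my, half_life_my):
    """Fraction of nupts still present after time_my, assuming exponential decay."""
    return 0.5 ** (time_my / half_life_my)


for label, half_life in (("> 1.6 kb", HALF_LIFE_LARGE), ("< 1.6 kb", HALF_LIFE_SMALL)):
    lost = 1 - fraction_remaining(1.0, half_life)
    print(f"{label}: {lost:.0%} eliminated after 1 million years")
```

Under this assumption, a half-life of 0.5 million years implies that about 75% of the large fragments are lost within 1 million years, which is of the same order as the 80% elimination reported for rice.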


3.3. Functional replacement of hundreds to thousands of chloroplast proteins in the
nucleus


Following endosymbiosis, the symbiont to organelle transition involved many steps. This 
includes the loss of the bacterial cell wall, the acquisition of a protein machinery that transfers 
nuclear-encoded proteins from the cytosol to the chloroplast (also known as the TIC and TOC 
complexes [75, 76]), and finally, the functional relocation of most chloroplast genes to the 
nucleus. As detailed below, a chloroplast gene may be replaced either only after its functional 
transfer to the nucleus, or directly substituted by a gene of a mitochondrial or eukaryotic 
origin. 


Since the endosymbiosis event, thousands of genes have relocated within the nuclear genome. 
Indeed, cyanobacterial genomes encode a minimum of 2000 proteins, whereas current plant 
plastomes encode only 80-200 proteins, although 800 to more than 2000 proteins have been 
found in some algae and plant chloroplasts [77], respectively. Apart from some genes that 
presented redundant functions in both chloroplast and nuclear genomes, most chloroplast 
genes have been functionally relocated to the nucleus with their proteins targeted back to the 
organelle. Thus, the spectrum of proteins required for function and biogenesis of the cytoplasmic
organelle has not changed greatly since its origin.
3.3.1. Functional transfer and relocation of a chloroplast gene to the nucleus


The current plastome of most plants encodes a maximum of 200 proteins [78], whereas
more than 2000 proteins are found in the chloroplast, suggesting the functional gene transfer and
relocation of most chloroplast genes to the nucleus. As chloroplast genes are of prokaryote 
origin, they are not readily functional in the nuclear genome. To function in this novel 
environment, a chloroplast gene has to acquire or hijack nuclear gene regulatory elements 
(eukaryote promoter and terminator), as well as a transit peptide to target the protein back 
to the chloroplast [60, 79]. However, the acquisition of all these nuclear elements does not
have to take place right after the transfer of a chloroplast gene to the nucleus, as transferred genes
can retain their open reading frames for several million years [71]. In addition, some chloroplast
genes may become functional relatively easily, as a few chloroplast promoters (i.e., psbA
and 16S rrn [80, 81]) were shown to be functional in the nucleus. Similarly, some transit
peptides may be of cyanobacterial origin [82] and the AT-richness of 3’UTR chloroplast 
gene regions may mimic a polyadenylation signal. 


To date, the number of chloroplast-encoded proteins (about 80) is relatively well conserved 
among flowering plants. However, a few chloroplast genes have been independently lost in 
various plant lineages [19], making it possible to study how transferred genes become functional.
Such chloroplast gene losses were most particularly observed in the Fabaceae, in which the plastome has
been extensively reorganized and contains regions with locally accelerated mutation rates [19]. Some of
these genes, such as rpl22 [83] and accD [19], have been shown experimentally to have been 
functionally transferred to the nucleus. Similarly, recent functional transfers of chloroplast 
genes, such as rpl32 [84] or infA [85], have been demonstrated. In addition, the functional relo- 
cation of infA and accD genes to the nucleus occurred several times independently [19, 85, 86]. 
Indeed, after the functional transfer of a chloroplast gene to the nucleus, two genes present
in two different cellular compartments encode the same chloroplast protein. On the one
hand, the retention of the chloroplast copy is favored as the chloroplast genome evolves more slowly
than the nuclear genome. On the other hand, even if the nuclear copy loses its functionality, the
whole process can be repeated.


3.3.2. Functional replacement of a chloroplast gene by a gene of mitochondrial (prokaryotic) or 
eukaryote origin 


The functional replacement of a chloroplast gene does not necessarily require its functional
transfer from the chloroplast to the nucleus. In the case of the chloroplast RPS16 protein, the 
chloroplast rps16 gene has been replaced by a nuclear rps16 gene of mitochondrial origin [22, 83]. 
This nuclear rps16 of mitochondrial origin had been functionally transferred to the nucleus 
soon after the formation of the mitochondria [22], and it acquired a dual target peptide to 
transfer the RPS16 protein to both chloroplasts and mitochondria [20]. Such functional replacement
is not so surprising, and many more similar functional transfers may have occurred, as the
prokaryotic ancestors of chloroplasts and mitochondria may have encoded similar proteins.


Another evolutionary mechanism enabling the functional replacement of a chloroplast gene
is the acquisition of a chloroplast transit peptide by a eukaryotic gene with the
same function. Such an event was observed for the chloroplast accD and the eukaryotic acc genes,
which both encode an acetyl-CoA carboxylase. In Arabidopsis, the nuclear acc gene has been
duplicated in tandem, and one copy has acquired a chloroplast targeting peptide and thus also
encodes a chloroplast ACCD protein [87].


The continuous deluge of organellar DNA to the nucleus has facilitated the functional
transfer of almost all chloroplast genes to the nucleus, extensively reducing plastome size.
Additionally, this organellar DNA was not only used to replace organellar genes but also
helped diversify the plant nuclear gene content [77].


3.4. Importance of chloroplast DNA transferred to the nucleus in diversifying the 
plant nuclear gene content 


Chloroplast gene sequences transferred to the nucleus may present different fates. As pre- 
sented in the two previous sections: (i) they may remain non-functional, decay, and ultimately 
be lost; (ii) they may acquire all the necessary elements to conserve the same function and 
have the protein targeted back to the chloroplast; or (iii) they may acquire new subcellular 
locations and functions. As mentioned earlier, Martin et al. [77] extrapolated that about 18% 
of Arabidopsis thaliana genes were acquired from the cyanobacterial ancestor of plastids and 
that more than half of these cyanobacterially derived proteins were not targeted to the
chloroplasts, suggesting either that they conserved their function but in another cellular
localization or that they acquired a new function. These proteins are involved in many different functional
categories that are not typically cyanobacterial, such as disease resistance and intracellular 
protein routing, indicating that they served as a rich source of genetic raw material and led 
to functional novelties. Similar analyses were performed in the glaucophyte Cyanophora para- 
doxa [88-90] and the green alga model Chlamydomonas reinhardtii [91]. Compared to what was 
observed in the flowering plant A. thaliana, only 6-7% of genes were inferred to be of cya- 
nobacterial origin. Of these genes of cyanobacterial origin, 90% were inferred to be targeted 
back to the chloroplast in C. paradoxa [88], indicating that the impact of nupts on creating 
novel genes (new function or new cellular location) varies between plant lineages. We can 
speculate that many factors could explain these differences, such as the nuclear genome size 
and its structural evolutionary dynamics. Another major impact of nupts on plant
proteome evolution is that they can generate novel nuclear exons
encoding proteins whose function differs from that of the preexisting organellar coding sequence.
Additionally, Noutsos et al. [92] found that the Ka/Ks ratios (non-synonymous substitutions/
synonymous substitutions) were higher than 1, reflecting non-neutral evolution of nupts
and their involvement in innovative functions.
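
The Ka/Ks reasoning above can be illustrated with a minimal sketch. It assumes that Ka and Ks
have already been estimated as substitutions per non-synonymous and per synonymous site; the
function name and the tolerance band around 1 are hypothetical choices made for illustration and
are not taken from Noutsos et al. [92].

```python
def classify_ka_ks(ka, ks, tolerance=0.1):
    """Return (Ka/Ks ratio, coarse interpretation).

    ka: non-synonymous substitutions per non-synonymous site
    ks: synonymous substitutions per synonymous site
    tolerance: hypothetical half-width of the band around 1 treated as near-neutral
    """
    if ks <= 0:
        raise ValueError("Ks must be positive to form a ratio")
    ratio = ka / ks
    if ratio > 1 + tolerance:
        label = "positive selection"   # more amino acid changes than expected under neutrality
    elif ratio < 1 - tolerance:
        label = "purifying selection"  # amino acid changes are selected against
    else:
        label = "near-neutral"
    return ratio, label


# Illustrative values only, not data from the cited study.
print(classify_ka_ks(0.5, 0.25))  # (2.0, 'positive selection')
```

In practice, a ratio above 1 is only suggestive; a statistical test against the neutral
expectation would be needed before inferring selection.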


4. Cytonuclear interactions, coadaptation processes, and 
incompatibilities 


The conversion of the cyanobacterial endosymbiont into the chloroplast partly results from 
the gradual transfer of hundreds to thousands of endosymbiont genes to the nuclear host. 
Across all lineages, more than 90% of the plant chloroplast proteins are now encoded in the
nucleus. Among the few chloroplast-encoded proteins, about 40% are involved in
chloroplast protein complexes that are made up of proteins encoded in both the chloroplast
and the nucleus. These complexes, such as photosystems I and II, perform functions that are
vital for the plant. One can only wonder how the stoichiometry between those
two compartments is maintained. Indeed, one cell might contain hundreds to thousands of
chloroplast genome copies compared to a single nucleus. Furthermore, chloroplast
inheritance is often maternal, whereas nuclear inheritance is biparental in angiosperms
during sexual reproduction. Therefore, coevolving interactions between cytoplasmic and nuclear
genomes have been necessary and have resulted in significant coadaptation processes. When
these fine-tuned coevolutionary interactions are disrupted, after intra- or interspecific
hybridization and/or genome doubling, for instance, incompatibilities and deleterious phenotypes can
be observed. These evolutionary processes will be discussed in the light of previous work on 
synthetic and natural hybrids, as well as in polyploid species. 


4.1. Hybridization and cytonuclear intergenomic complexes 


Several evolutionary scenarios can explain coadaptation between chloroplast and nuclear 
genomes after intraspecific hybridization. First, cytoplasmic genomes lack sexual
reproduction and are more prone to fixing and accumulating deleterious mutations by genetic drift [93].
Only positive selection for compensatory nuclear alleles will allow optimal
organelle function to be regained [94]. This mechanism of compensatory coadaptation has been shown
in several plant species with photosynthesis dysfunction (reviewed in [95]). One of the best 
examples with detailed genetic studies comes from the genus Oenothera [96], where three
basic haploid nuclear genomes, in homozygous or heterozygous diploid combinations (six
genotypes), can be associated with five different chloroplast haplotypes.
Of the 30 possible combinations, only 12 produce a green viable phenotype, whereas the 18 
remaining associations lead to various degrees of cytonuclear incompatibilities, from reduced 
phenotypic capacity to embryo lethality [97]. Subsection Oenothera has apparently separated 
into three distinct evolutionary lineages (represented by the three basic haploid genomes A, B,
and C) that have coevolved with chloroplast haplotypes I, III, and V, respectively [97]. Recent
molecular work suggests that the radiation within this subsection started approximately 1
million years ago [98]. Thus, these results suggest that, in Oenothera, cytonuclear
incompatibilities and associated coadaptation mechanisms have rapidly led to strong post-zygotic
barriers after only 1 million years of divergence [99].
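
The count of 30 combinations in Oenothera can be reproduced with a short sketch. It rests on two
assumptions that are not spelled out in the text: that the three haploid genomes (A, B, and C)
are paired into six diploid genotypes, and that the five chloroplast haplotypes are numbered I
to V. Which specific combinations are viable is not encoded here; only the totals given
above (12 viable, 18 incompatible) are used.

```python
from itertools import combinations_with_replacement, product

haploid_genomes = ["A", "B", "C"]
plastome_types = ["I", "II", "III", "IV", "V"]  # assumed labels for the five haplotypes

# Assumption: diploid genotypes are unordered pairs of haploid genomes (AA, AB, AC, BB, BC, CC).
diploid_genotypes = ["".join(pair) for pair in combinations_with_replacement(haploid_genomes, 2)]

# Every diploid genotype paired with every plastome type.
combinations = list(product(diploid_genotypes, plastome_types))

viable = 12  # green, viable combinations reported in the text
incompatible = len(combinations) - viable

print(len(diploid_genotypes), len(combinations), incompatible)  # 6 30 18
```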


Second, some mutations in the organelles could also be adaptive in specific environments 
and fixed in the population by natural selection. Subsequently, coadaptation processes may
favor specific nuclear variants to preserve intergenomic interactions. This mechanism is
called adaptive divergence. Experimental studies in the genus Helianthus give
some hints of the effects of extrinsic selection on cytonuclear interactions. Exchange
of the common sunflower cytoplasm with closely related species’ organelles leads, just as
in Oenothera, to deleterious phenotypes (from altered biomass to reduced seed weight and
pollen unviability), suggesting, again, a role of cytonuclear incompatibilities in
establishing reproductive barriers between populations [100]. An additional study demonstrated the
contrasting adaptive potential of two cytoplasmic genomes in two alternative ecological
environments. Sambatti et al. [101] performed reciprocal transplant experiments of H.
annuus and H. petiolaris and all possible backcross combinations of nuclear and cytoplasmic
genomes in two contrasting ecological environments. The authors elegantly showed that
the cytoplasms of H. annuus and H. petiolaris exhibit higher fitness in mesic and xeric
habitats, respectively, and are therefore differentially adapted to these two contrasting habitats.
More recently, authors have taken advantage of the model system A. thaliana to investigate the
contribution of cytonuclear interactions to plant fitness variation [102]. In this study, a
field experiment was set up with 56 different cytoplasmic lines (based on eight natural
accessions of A. thaliana), each combining the nuclear genome of one parent with the organelle
genomes of another. Using 28 adaptive phenotypic traits (such as germination, phenology,
and fecundity), the authors showed that a large proportion of those traits are affected by
intraspecific cytonuclear interactions. However, the genetic factors and molecular interactions
underlying such phenomena are still to be elucidated.
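
As a quick check of the experimental design described above, the figure of 56 cytoplasmic lines
is what one obtains if every accession donates its nuclear genome to a line carrying the
organelles of each of the seven other accessions. This is an assumption about how the lines were
constructed, and the accession labels below are hypothetical placeholders.

```python
from itertools import permutations

# Hypothetical labels standing in for the eight natural accessions.
accessions = [f"accession_{i}" for i in range(1, 9)]

# Ordered pairs (nuclear donor, cytoplasm donor) with two different parents.
cytolines = list(permutations(accessions, 2))

print(len(cytolines))  # 56, i.e. 8 nuclear donors x 7 cytoplasm donors
```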


As mentioned above, the examples for intergenomic coadaptation and incompatibilities are 
scarce, and we are still very far from unraveling the molecular processes underlying such 
interactions. Applications of genome-wide studies in association with high-throughput 
sequencing would greatly improve our understanding of cytonuclear coevolution. 


4.2. Effects of whole genome doubling and interspecific hybridization on 
cytonuclear complex stability 


As shown above, cytonuclear interactions are extremely fine-tuned coevolved molecular
processes that are still largely understudied. However, in recent years, efforts have been made,
especially in neo-polyploid plant species (natural and resynthesized), to better understand the
consequences of whole genome duplication (WGD) and interspecific hybridization on
cytonuclear interactions and stability. In this last section, we will review our knowledge on such
systems and elaborate on the many future issues to address.


Although largely overlooked, the consequences of a WGD event on copy number variation and
stoichiometry within cytonuclear complexes are likely to be numerous and drastic.
Impacts of WGD on genomic structure and functional changes have been extensively
studied in a large variety of plant systems. Genome redundancy can lead to changes
in epigenetic patterns (including transposable element dynamics), altered gene expression 
(changes in global gene expression but also possible biased contribution of redundant cop- 
ies), and fractionation processes (gene loss, homologous and non-homologous exchanges). 
However, to date, very few studies have investigated how the duplication of nuclear genes 
would affect the assembly dynamics of the multi-subunit cytonuclear complexes [103]. 
Different hypotheses predict the fate of nuclear and cytoplasmic genes implicated in cyto- 
nuclear complexes. They are based on the prediction that selection will favor compensatory 
mechanisms to maintain coordinated expression between cytoplasmic and nuclear genes 
ultimately leading to a functional complex. Immediate impacts of WGD could therefore include
downregulation of nuclear genes and/or upregulation of cytoplasmic genes. Additionally,
another path to achieve the same outcome would be for the cell to enhance organelle
biogenesis and produce a larger number of chloroplasts. This has been shown in cotton and alfalfa
polyploids, which exhibit larger chloroplast size and higher chloroplast number per cell
relative to their diploid progenitors [104, 105]. For instance, chloroplast number in guard cells
is increased by 25, 72, and 102% in triploid, tetraploid, and hexaploid cottons, respectively,
compared to diploids [105]. Consequently, it is hypothesized that larger chloroplasts could
carry more genome copies per organelle. In maize, only chloroplast number per cell (and not
chloroplast size) increases with ploidy [106]. However, it seems that chloroplast
proliferation might be more correlated with cell size than with nuclear ploidy [107]. Indeed, a positive
relationship exists between nuclear genome size and cell size [108], but the direct impact of
WGD and the presence of redundant genomes has yet to be elucidated.
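
To make the stoichiometry question concrete, the back-of-the-envelope sketch below compares the
reported increases in guard-cell chloroplast number in cotton [105] with the corresponding
increase in nuclear genome copies. It assumes that chloroplast number per cell is a usable proxy
for chloroplast genome dosage and that the diploid is the baseline; the function name is
hypothetical, and the calculation ignores chloroplast size and genome copies per organelle.

```python
# Reported increase (%) in guard-cell chloroplast number relative to diploids [105].
reported_increase_pct = {"triploid": 25, "tetraploid": 72, "hexaploid": 102}
ploidy_level = {"triploid": 3, "tetraploid": 4, "hexaploid": 6}


def chloroplasts_per_genome_copy(increase_pct, ploidy, baseline_ploidy=2):
    """Fold change in chloroplast number divided by fold change in nuclear genome copies.

    A value of 1 means chloroplast number scales exactly with ploidy; a value below 1
    means it increases less than proportionally.
    """
    chloroplast_fold = 1 + increase_pct / 100
    nuclear_fold = ploidy / baseline_ploidy
    return chloroplast_fold / nuclear_fold


for name, pct in reported_increase_pct.items():
    ratio = chloroplasts_per_genome_copy(pct, ploidy_level[name])
    print(name, round(ratio, 2))
# triploid 0.83
# tetraploid 0.86
# hexaploid 0.67
```

Under these simplifying assumptions, chloroplast number rises with ploidy but less than
proportionally, which is in line with the idea that chloroplast size or other compensatory
mechanisms may also contribute.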


Only a handful of studies have looked at the consequences of WGD on a longer time scale. In
that case, subfunctionalization and pseudogenization of duplicated copies are
to be expected. Coate et al. [109] proposed that the sensitivity of cytonuclear complexes to gene
dosage imbalance may considerably influence whether duplicated genes return to single-copy
status or are retained as duplicates. More specifically, Coate et al. [109] demonstrated that in
Glycine max, Medicago truncatula, and A. thaliana photosystem gene families are preferentially 
retained as duplicates after WGD. This trend is likely explained by the high dosage sensitivity 
of these cytonuclear complexes. The authors hypothesized that if one of the duplicated gene 
copies implicated in the same cytonuclear complex is lost, it will cause gene dosage imbalance 
between genes, and the complex will not function properly. On the contrary, other complexes 
are apparently less affected by gene dosage imbalance and tolerate different copy numbers 
among genes (of the same complex). 


All of these processes could be enhanced through allopolyploidy, where divergent parental
species first hybridized before genome doubling. In that case, the nuclear genome is
redundant and a mixture of two, more or less, divergent parental genomes, whereas the organelles
have (usually) a uniparental origin. Therefore, as chloroplast inheritance is usually maternal,
selection should favor maintenance of maternal nuclear copies over the paternally inherited
homoeolog so as to preserve pre-existing coadaptive cytonuclear interactions. In allopolyploids,
different scenarios leading to pseudogenization of paternal copies can be envisioned and
were tested in a limited set of genes and species. The first scenario involves
downregulation and relaxed selection of the paternally inherited homoeolog. An alternative scenario
involves preferential gene conversion to the maternal homoeolog resulting in the loss of the
paternal-like copy. It is important to note that the two scenarios are not mutually exclusive but
could be part of a dynamic and gradual process, with first overexpression of the maternal copies
leading to paternal homoeolog pseudogenization and maternally biased gene conversion. These
hypotheses have only been tested in the Rubisco nuclear-encoded gene rbcS in various
allopolyploids. In cotton, an ancient allopolyploid formed 1-2 MYA (progenitors diverged 5-10
MYA), it has been shown in five different allopolyploid species that putative events of
gene conversion occur between subgenomes but not in synthetic hybrids [110]. Interestingly,
maternal homoeologs are preferentially expressed in wild and cultivated allopolyploids as
well as in the synthetic F1 hybrid (whereas no such bias is observed between the diploid
progenitors) [110]. These patterns have also been shown in other polyploid models. Following
the same methods, concerted evolution is reported between homoeologous genomes of
Arabidopsis suecica, Arachis hypogaea, and N. tabacum [111]. Additionally, there is preferential
occurrence of maternal to paternal gene conversion in signaling and regulatory domains of
the rbcS gene copies. In those polyploids, preferential expression of paternal homoeologs
carrying the maternal-like gene conversions has also been described [111]. In contrast, the
allotetraploid Brassica napus showed no sign of homoeologous exchanges or biased expression,
probably because of the recent (compared to the other models) divergence between its
diploid parental species (only 4 MYA). In the same way, resynthesized reciprocal hybrids and
allotetraploids formed between Oryza sativa indica and japonica (that diverged around 9000 years
ago) exhibited neither biased expression of rbcS alleles or homoeologs nor biased gene
conversion toward maternal gene copies [112]. In Tragopogon miscellus, a very recent
neo-allopolyploid formed only 80 years ago, homoeolog gene loss and biased expression were
limited, occurring only in 12 and 16% of individuals coming from two naturally and
repeatedly formed polyploid populations [113]. However, the bias was mainly toward maintaining
the maternal nuclear copy of rbcS (in 7 of 10 cases of homoeolog loss). Therefore, although
parental genomes of the neotetraploid T. miscellus are quite divergent [114], very
little evidence for functionalization and homogenization of duplicated copies is visible in the
polyploids. This might be due to the recent formation of such polyploids (less than 100 years
ago) and the lack of time for such events to take place. Thus, in the case of allopolyploid
formation, divergence between parental species and age of polyploids seem to be important
factors driving cytonuclear coevolution processes.


These few studies already highlight the complexity of the different model systems that can be
highly influenced by various evolutionary processes such as pre-existing coadaptive
mechanisms, natural selection, and divergence between parental individuals (different populations
to different species). As all angiosperms have experienced at least one round of genome
duplication and most of them multiple WGDs (e.g., Triticum and Brassica), paleopolyploid
species are perfect candidates to elucidate the long-term impact of diploidization and biased
genome fractionation on rates of asymmetric gene loss and pseudogenization. Additionally, it
seems essential to integrate plant families that have contrasting rates of chloroplast evolution
(such as Geraniaceae, Campanulaceae, and Fabaceae) and paternally inherited chloroplast
genomes (such as in Actinidia, Medicago, and most conifers). Finally, life history features
such as reproductive strategy (perennial vs. annual), mating system (selfer vs. outcrosser),
population level dynamics, and effective population size will also impact the fixation rate of
mutations.